# Open-source Model

Gemma 3

Gemma 3 is Google's latest open-source model, developed using research and technology from Gemini 2.0. It's a lightweight, high-performance model that runs on a single GPU or TPU, providing developers with powerful AI capabilities. Gemma 3 offers various sizes (1B, 4B, 12B, and 27B), supports over 140 languages, and boasts advanced text and visual reasoning capabilities. Its key advantages include high performance, low computational requirements, and extensive multilingual support, making it suitable for rapid AI application deployment on diverse devices. The launch of Gemma 3 aims to promote AI technology adoption and innovation, helping developers achieve efficient development across different hardware platforms.

Hibiki

Hibiki is a model for streaming speech translation. It accumulates just enough context to produce an accurate translation in real time, outputs both speech and text, and can transfer the speaker's voice to the translated audio. The model is based on a multi-stream architecture that processes source and target speech simultaneously, producing a continuous audio stream together with a timestamped text translation. Its main advantages include high-fidelity voice transfer, low-latency real-time translation, and simple sampling-based inference that is compatible with batching. Hibiki currently supports translation from French to English and is suited to real-time translation scenarios such as international conferences and multilingual live events. The model is open-source and free, making it a good fit for developers and researchers.

YuE

YuE is an open-source music generation model developed by the Hong Kong University of Science and Technology and the Multimodal Art Projection (M-A-P) team. Given a set of lyrics, it can generate full songs up to 5 minutes long, including vocals and accompaniment. The model tackles the difficult lyrics-to-song problem through several technical innovations, such as a semantically enhanced audio tokenizer, a dual-token technique, and a lyrics chain-of-thought. YuE's main advantages are high-quality musical output and support for multiple languages and music styles, with strong scalability and controllability. The model is currently free and open-source, with the aim of advancing music generation technology.

Music Generation

MatterGen

Launched by Microsoft Research, MatterGen is a generative AI tool for materials design. It directly generates new materials with specified chemical, mechanical, electronic, or magnetic properties based on the design requirements of an application, offering a new paradigm for materials exploration. The tool is expected to accelerate R&D for novel materials, lower R&D costs, and play a significant role in fields such as batteries, solar cells, and CO2 adsorbents. MatterGen's source code is available on GitHub for public use and further development.

Research Equipment

Kokoro-82M

Kokoro-82M is a text-to-speech (TTS) model created by hexgrad and hosted on Hugging Face. It has 82 million parameters and is open-sourced under the Apache 2.0 license. Version 0.19 was released on December 25, 2024, with 10 unique voice packs. Kokoro-82M ranks first in the TTS Spaces Arena, which shows how efficiently it uses its parameters and training data. It supports both American and British English and is suitable for generating high-quality speech output.

Allegro-TI2V

Allegro-TI2V is a text-and-image-to-video (TI2V) generation model that creates video content from user-provided prompts and images. The model is notable for being open-source, for its diverse content creation capabilities and high-quality output, for its compact and efficient parameter count, and for its support for multiple precisions and GPU memory optimizations. It reflects recent advances in AI video generation and has both technical value and commercial potential. Allegro-TI2V is available on Hugging Face under the Apache 2.0 open-source license, so users can download and use it for free.

Video Production

Qwen2.5-Coder-32B-Instruct-AWQ

Qwen2.5-Coder is a series of large language models optimized for code, available in six mainstream sizes (0.5, 1.5, 3, 7, 14, and 32 billion parameters) to meet the varied needs of developers. Qwen2.5-Coder shows significant improvements in code generation, code reasoning, and code fixing. It is built on the Qwen2.5 backbone, with training scaled up to 5.5 trillion tokens that include source code, text-code grounding data, and synthetic data. This makes it one of the most advanced open-source code LLMs, with coding capabilities comparable to GPT-4o. Qwen2.5-Coder also provides a more comprehensive foundation for real-world applications such as code agents.
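To make the size range concrete, the following rough sketch estimates how much memory the weights alone would occupy at each listed size. It assumes 16-bit weights for an unquantized checkpoint and 4-bit weights for an AWQ checkpoint (typical for AWQ, but an assumption here), treats the nominal sizes as exact parameter counts, and ignores quantization metadata, activations, and the KV cache. The helper name `weight_memory_gib` is hypothetical.

```python
# Rough weight-only memory estimate; real checkpoints will differ somewhat.
SIZES_BILLION = [0.5, 1.5, 3, 7, 14, 32]  # sizes listed in the description


def weight_memory_gib(num_params: float, bits_per_weight: int) -> float:
    """Return the memory needed to store the weights, in GiB."""
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / 2**30


for size in SIZES_BILLION:
    params = size * 1e9
    fp16 = weight_memory_gib(params, 16)  # assumed unquantized precision
    int4 = weight_memory_gib(params, 4)   # assumed AWQ precision
    print(f"{size:>4}B  16-bit: {fp16:6.1f} GiB   4-bit: {int4:5.1f} GiB")
```

Under these assumptions, 4-bit quantization cuts weight storage to a quarter of the 16-bit figure, which is the main reason a quantized 32B variant is practical on a single high-memory GPU.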

Qwen2.5-Coder-1.5B

Qwen2.5-Coder-1.5B is a large language model in the Qwen2.5-Coder series, focused on code generation, code reasoning, and code fixing. The series is built on the Qwen2.5 architecture, with training scaled up to 5.5 trillion tokens that include source code, text-code grounding data, synthetic data, and more. The series ranks among the leading open-source code LLMs, and its largest 32B model rivals GPT-4o's coding capabilities. Qwen2.5-Coder-1.5B also retains strong mathematical and general capabilities, providing a more comprehensive foundation for practical applications such as code agents.

Coding Assistant

Tencent Hunyuan 3D

Tencent Hunyuan 3D is an open-source 3D generation model designed to address the slow generation speed and limited generalization of existing 3D generation models. It uses a two-stage approach: the first stage rapidly generates multi-view images with a multi-view diffusion model, and the second stage quickly reconstructs the 3D asset with a feed-forward reconstruction model. The Hunyuan 3D-1.0 model helps 3D creators and artists automate the production of 3D assets. It supports fast single-image 3D generation and completes end-to-end production, including mesh and texture extraction, within 10 seconds.

hertz-dev

Hertz-dev is a full-duplex, audio-only transformer foundation model open-sourced by Standard Intelligence, with 8.5 billion parameters. The model represents scalable cross-modal learning technology and can convert mono 16kHz speech into an 8Hz latent representation at a bitrate of 1kbps, outperforming other audio encoders. Key advantages of hertz-dev include low latency, high efficiency, and ease of fine-tuning and extension by researchers. Standard Intelligence states that it is committed to developing general intelligence that benefits humanity, and it describes hertz-dev as a substantial step in that direction.
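The codec figures quoted above imply a per-frame budget that is easy to work out. The sketch below is only back-of-the-envelope arithmetic on the numbers in the description. It assumes "1kbps" means 1,000 bits per second and that the raw input is 16-bit PCM (neither is stated in the source), and the function name `codec_budget` is hypothetical.

```python
# Back-of-the-envelope arithmetic for the quoted codec figures.
SAMPLE_RATE_HZ = 16_000  # mono input speech, from the description
LATENT_RATE_HZ = 8       # latent frames per second, from the description
BITRATE_BPS = 1_000      # assumption: "1kbps" = 1,000 bits per second


def codec_budget(sample_rate_hz, latent_rate_hz, bitrate_bps, pcm_bits=16):
    """Derive per-frame quantities; pcm_bits=16 is an assumed input depth."""
    raw_bitrate_bps = sample_rate_hz * pcm_bits
    return {
        # Audio samples summarized by each latent frame.
        "samples_per_frame": sample_rate_hz // latent_rate_hz,
        # Bits available to describe each latent frame.
        "bits_per_frame": bitrate_bps / latent_rate_hz,
        # Reduction relative to the assumed raw PCM stream.
        "compression_ratio": raw_bitrate_bps / bitrate_bps,
    }


budget = codec_budget(SAMPLE_RATE_HZ, LATENT_RATE_HZ, BITRATE_BPS)
print(budget)
```

Under these assumptions, each latent frame covers 125 ms of audio (2,000 samples) and has 125 bits to describe it.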

Model Training and Deployment

Mochi 1

Mochi 1 is an open-source video generation model released by Genmo as a research preview, aiming to address fundamental problems in the current AI video landscape. The model is known for its motion quality, its strong prompt adherence, and its ability to cross the uncanny valley by generating coherent, fluid human actions and expressions. Mochi 1 was developed in response to the growing demand for high-quality video content, particularly in the gaming, film, and entertainment industries. A free trial is currently available, though detailed pricing information is not provided on the page.

Video Production

Janus

Janus is an autoregressive framework that unifies multimodal understanding and generation. It addresses the limitations of previous methods by decoupling visual encoding into separate pathways while still using a single, unified transformer architecture for processing. This decoupling relieves the conflict between the visual encoder's roles in understanding and in generation, and it also makes the framework more flexible. Janus outperforms earlier unified models and matches or exceeds the performance of task-specific models. Its simplicity, flexibility, and effectiveness make it a strong candidate for next-generation unified multimodal models.

Model Training and Deployment

CogVideoX

CogVideoX is an open-source video generation model that shares its origins with a commercial counterpart and generates video content from text descriptions. It reflects recent advances in text-to-video generation and can produce high-quality videos for fields such as entertainment, education, and commercial promotion.

AI Video Generation

# Featured AI Tools

Flow AI

Flow is an AI-driven movie-making tool designed for creators, utilizing Google DeepMind's advanced models to allow users to easily create excellent movie clips, scenes, and stories. The tool provides a seamless creative experience, supporting user-defined assets or generating content within Flow. In terms of pricing, the Google AI Pro and Google AI Ultra plans offer different functionalities suitable for various user needs.

Video Production

NoCode

NoCode is a platform that requires no programming experience. Users describe their ideas in natural language and the platform quickly generates an application, lowering the barrier to development so that more people can realize their ideas. It provides real-time previews and one-click deployment, which makes it well suited to non-technical users.

Development Platform

ListenHub

ListenHub is a lightweight AI podcast generation tool that supports both Chinese and English. Based on cutting-edge AI technology, it can quickly generate podcast content of interest to users. Its main advantages include natural dialogue and ultra-realistic voice effects, allowing users to enjoy high-quality auditory experiences anytime and anywhere. ListenHub not only improves the speed of content generation but also offers compatibility with mobile devices, making it convenient for users to use in different settings. The product is positioned as an efficient information acquisition tool, suitable for the needs of a wide range of listeners.

MiniMax Agent

MiniMax Agent is an AI companion built on the latest multimodal technology. Its MCP-based multi-agent collaboration lets teams of AI agents solve complex problems efficiently. It provides features such as instant answers, visual analysis, and voice interaction, which the product claims can increase productivity tenfold.

Multimodal technology

Tencent Hunyuan Image 2.0

Tencent Hunyuan Image 2.0 is Tencent's latest AI image generation model, with significantly improved generation speed and image quality. Using a codec with a very high compression ratio and a new diffusion architecture, it can generate images in milliseconds, avoiding the wait typical of traditional generation. The model also improves image realism and detail by combining reinforcement learning algorithms with human aesthetic knowledge, making it suitable for professional users such as designers and creators.

Image Generation

OpenMemory MCP

OpenMemory is an open-source personal memory layer that provides private, portable memory management for large language models (LLMs). It ensures users have full control over their data, maintaining its security when building AI applications. This project supports Docker, Python, and Node.js, making it suitable for developers seeking personalized AI experiences. OpenMemory is particularly suited for users who wish to use AI without revealing personal information.

FastVLM

FastVLM is an efficient visual encoding model designed specifically for vision language models. It uses the FastViTHD hybrid visual encoder to reduce both the time required to encode high-resolution images and the number of output tokens, delivering strong results in speed and accuracy. FastVLM is positioned to give developers powerful vision-language processing capabilities across a range of scenarios, and it performs especially well on mobile devices that require fast responses.

Image Processing

LiblibAI

LiblibAI is a leading Chinese AI creative platform offering powerful AI creative tools to help creators bring their imagination to life. The platform provides a vast library of free AI creative models, allowing users to search and utilize these models for image, text, and audio creations. Users can also train their own AI models on the platform. Focused on the diverse needs of creators, LiblibAI is committed to creating inclusive conditions and serving the creative industry, ensuring that everyone can enjoy the joy of creation.

AIbase

Empowering the Future, Your AI Solution Knowledge Base

© 2025 AIbase